Feasibility of Interpretable Machine Learning-based
Species Distribution Models: A case of Cormorants in South Korean Rivers
Cheongok Jeon1, Junbeom Bahk2
1The Institute for Korean Regional Studies; LophitaL@snu.ac.kr
2Seoul National University; nongsong@snu.ac.kr
Table of contents
Introduction
Method
Result
Discussion
Conclusion
Species distribution model; SDM
> Models for estimating the spatial distribution of species based on the relationship between the species and environmental variables (Elith and Leathwick, 2009)
A major tool for biogeographic research now that GIS-based environmental information is available
>> Introduction: SDM?
Part 1
https://damariszurell.github.io/SDM-Intro/
>>
Part 1
Model data is usually built by spatial extraction
For highly mobile species, if categorical variables are used without “transformation”, presence information may not mean “preference”.
The composition of categorical variables around observed locations, rather than just at observed locations, can be useful for capturing the real preference.
Necessity of proper conversion of categorical variables
Introduction: Issue
A proper transformation is required to reflect the spatial scale recognized by animals rather than directly utilizing the data themselves (Guisan and Thuiller, 2005)
>>
Part 1
SDM studies mainly use numerical data at the macro scale (e.g. continental units).
Categorical variables can also affect the spatial distribution of species but are not used properly because they are difficult to transform.
Visually oriented taxa, such as birds (Aves) and primates, may rely mainly on the visual landscape for habitat selection.
Estimation of micro-scale spatial distribution using landscape variables
© Hans Stielglitzd, © Dennis Jarvis
Introduction: Issue
>>
Part 1
Machine learning algorithms have excellent predictive performance due to their high flexibility, but are difficult to interpret due to their complex functional form
Prediction accuracy-model interpretability trade-off
This needs to be addressed in biogeographical research, which requires ecological interpretation (Ryo et al., 2021)
We attempt to reduce the trade-off and fit models that are both predictive and interpretable by using interpretable machine learning methods
https://damariszurell.github.io/EEC-MGC/index.html
Introduction: Research Methods
>>
Part 1
Phalacrocorax carbo
Nearly cosmopolitan diving waterbird
In 1998, some populations were first reported to have become resident birds (Song et al., 2017)
Control of the species has been under discussion since 2020 (Park, 2020)
Mainly found on isolated islands in large rivers, reservoirs, and the sea
Quantitative research to understand its ecology is insufficient
Phalacrocorax carbo, Phalacrocorax pelagicus (source: National Institute of Biological Resources; search date: 2023-04-21)
Introduction: Research Subject
Figure: Population trends of cormorants in South Korea (axes: year; population; number of observation points)
>>
Part 2
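[Placeholder: rough descriptive-statistics bar chart (time period vs. number of presence records); bar graph of environmental variables]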
Presence data: 2,566 records, 15 sites, 2019-2022
Methods: Species occurrences
Figure: number of presence records by monitoring year
Geomorphology                   Vegetation
WATer; WAT                      Wetland Grass Veg; WGV
River ISland; RIS               Wetland Wood Veg; WWV
ArtIficial Structure; AIS       Dryland Grass Veg; DGV
FlooDPlain; FDP                 Dryland Wood Veg; DWV
SanD Dune; SDD
>> Methods: Environment variables
Part 2
Geomorphic and vegetation landscape variables
3 km upstream and 5 km downstream of each of the 15 barrages
Aerial and drone photography with field survey
2-dimensional categorical variables as spatial polygons
>> Methods: Variable Transformation
Part 2
The area ratio of each variable within the potential landscape perception range was assigned as the pixel value
Perception ranges: 150 m, 250 m, 400 m and 500 m
Moving window transformation
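The moving-window transformation can be illustrated with a minimal sketch. It assumes the polygon layers have already been rasterized to a grid of class codes (the slides do not state the cell size or the tooling); `area_ratio`, the 50 m cell size and the toy raster are hypothetical and only show the idea of turning a categorical layer into a per-pixel area ratio.

```python
from math import hypot

def area_ratio(grid, target, radius_m, cell_m):
    """For every cell, return the share of cells of class `target`
    whose centre lies within `radius_m` (a circular moving window)."""
    r = int(radius_m // cell_m)  # window radius in whole cells
    # Offsets of all cells whose centre falls inside the circular window.
    offsets = [(dy, dx)
               for dy in range(-r, r + 1)
               for dx in range(-r, r + 1)
               if hypot(dy, dx) * cell_m <= radius_m]
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            inside = hits = 0
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < rows and 0 <= xx < cols:  # clip at the raster edge
                    inside += 1
                    hits += grid[yy][xx] == target
            out[y][x] = hits / inside
    return out

# Toy 5 x 5 land-cover raster with 50 m cells (illustrative values only).
landcover = [
    ["FDP", "FDP", "WAT", "WAT", "WAT"],
    ["FDP", "WWV", "WAT", "WAT", "WAT"],
    ["FDP", "WWV", "RIS", "WAT", "WAT"],
    ["FDP", "FDP", "WAT", "WAT", "FDP"],
    ["FDP", "FDP", "WAT", "FDP", "FDP"],
]
wat_150 = area_ratio(landcover, "WAT", radius_m=150, cell_m=50)
```

Running the same function with each perception range (150 m, 250 m, 400 m, 500 m) would give one continuous layer per variable and range, which is the form a model such as MaxEnt can use.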
>> Methods: Modelling
Part 2
Maximum Entropy; MaxEnt (Phillips et al., 2004; 2006)
> No absence data: the study area is limited to sites near the rivers, which makes it difficult to determine unsuitable regions
> Estimation of relative habitat suitability using background data representative of a given environment
We used SHapley Additive exPlanations (SHAP), which can interpret each presence record individually (Lundberg and Lee, 2017)
> Interpretation of our model results using an interpretable machine learning method
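To make the idea of a per-record Shapley value concrete, the sketch below computes exact Shapley values for a toy model by enumerating all variable coalitions. It is not the SHAP library and not the fitted MaxEnt model: `toy_suitability`, its coefficients, and the all-zero baseline are assumptions for illustration, and absent variables are simply held at the baseline rather than averaged over background data.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Variables in the coalition take the observed value; the rest
        # stay at the baseline (a simplification of SHAP's expectation).
        z = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(z)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of variable i to this coalition.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

def toy_suitability(z):
    """Hypothetical suitability with a WAT-WWV interaction term."""
    return 0.5 * z["WAT"] + 0.2 * z["WWV"] + 0.4 * z["WAT"] * z["WWV"]

record = {"WAT": 0.6, "WWV": 0.5, "RIS": 0.1}    # one presence record
baseline = {"WAT": 0.0, "WWV": 0.0, "RIS": 0.0}  # reference point
phi = shapley_values(toy_suitability, record, baseline)
```

The values sum to the difference between the prediction for the record and the prediction at the baseline, and the interaction term is shared equally between WAT and WWV, while the unused RIS receives zero.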
>> Results: Model Validation
Part 3
Since there is no absence data for accurate validation, we validated the models with a reliability index rather than a discrimination index
Reliability is judged to be higher as the proportion of presence data receiving higher suitability predictions than random background data increases
Boyce index of 0.9 or higher (high reliability) (Hirzel et al., 2006)
In the 250 m and 500 m models, decreases were confirmed in certain sections, but the ratio remained high overall
Figure: presence/expected ratio against habitat suitability, by perception range
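A simplified version of the Boyce index can be sketched as follows. It assumes suitability scores scaled to [0, 1] and uses fixed, non-overlapping bins instead of the moving window of the continuous Boyce index in Hirzel et al. (2006); the function names and the toy scores are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; undefined if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

def share_in_bin(scores, lo, hi, is_last):
    """Fraction of scores in [lo, hi); the last bin also includes hi."""
    hits = sum(1 for s in scores if lo <= s < hi or (is_last and s == hi))
    return hits / len(scores)

def boyce_index(presence, background, n_bins=10):
    """Spearman correlation between suitability bin and the ratio of
    presence share to background (expected) share in that bin."""
    mids, ratios = [], []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        last = b == n_bins - 1
        expected = share_in_bin(background, lo, hi, last)
        if expected == 0:  # bins without background cells are skipped
            continue
        observed = share_in_bin(presence, lo, hi, last)
        mids.append((lo + hi) / 2)
        ratios.append(observed / expected)
    # Spearman correlation = Pearson correlation of the ranks.
    return pearson(average_ranks(mids), average_ranks(ratios))

# Toy scores: background is uniform, presences concentrate at high suitability.
background = [i / 100 for i in range(100)]
presence = [(b + 0.5) / 10 for b in range(10) for _ in range(b + 1)]
index = boyce_index(presence, background)
```

An index close to 1 means the presence/expected ratio rises steadily with predicted suitability; values near 0 indicate a model no better than random, and negative values indicate a model that ranks poor habitat above good habitat.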
>> Results: Model Prediction
Part 3
High habitat suitability: where WWV and RIS are adjacent or overlap
Low habitat suitability: where the area of any single variable other than WAT is too large
Figure: predicted habitat suitability with WWV, WAT and RIS marked; panels (A) Ipobo, (B) Sejongbo, (C) Gumibo
>> Results: Interpretation(Mean SHAP)
Part 3
Mean SHAP: the mean of the absolute Shapley values over the presence data, used to determine variable importance
Importance is relatively consistent, with the exception of the 150 m model
WAT, FDP and RIS show high importance
WWV contributed relatively consistently regardless of the model
This implies that an appropriate perception range can be applied per variable
Figure: Mean SHAP by perception range for WAT, FDP, RIS, DGV, WWV, SDD and AIS
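The Mean SHAP importance described on this slide reduces to a short calculation, sketched here under the assumption that the Shapley values for one model are available as one dictionary per presence record. The numbers below are made up for illustration and are not results from the study.

```python
def mean_abs_shap(shap_rows):
    """Mean absolute Shapley value per variable over presence records."""
    variables = list(shap_rows[0])
    n = len(shap_rows)
    return {v: sum(abs(row[v]) for row in shap_rows) / n for v in variables}

# Made-up Shapley values for three presence records in one model.
shap_rows = [
    {"WAT": 0.30, "RIS": -0.10, "WWV": 0.05},
    {"WAT": -0.20, "RIS": 0.20, "WWV": 0.05},
    {"WAT": 0.10, "RIS": 0.00, "WWV": 0.05},
]
importance = mean_abs_shap(shap_rows)
# Variables ordered from most to least important.
ranking = sorted(importance, key=importance.get, reverse=True)
```

Taking absolute values before averaging matters: positive and negative contributions of the same variable would otherwise cancel and understate its importance.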
>> Discussion: Interpretation(SHAP)
Part 4
WAT (water)
A variable that reflects food sources
If the perception range is reduced, the area ratio must increase to yield the same contribution; that is, the species mainly inhabits bodies of water of at least a certain area
The water area is also related to the water depth on the river centerline, which has implications for food sources and river ice
image: Flaticon.com
Each point indicates the degree and direction of the contribution to habitat suitability estimated for a presence record (y-axis) against the value of the variable (x-axis)
Figure labels: area ratio; perception range
>> Discussion: Interpretation(SHAP)
Part 4
A variable that reflects resting spots and microhabitat
The area ratios required for a positive contribution are similar across all models; that is, the species prefers nearby river islands with a proper ‘ratio’
An overly wide island contributes negatively, which may indicate narrow bodies of water and shallow depths
Even if the island is absent, the negative contribution is not large, so substitutes may exist
However, if present, it can be actively occupied and make a high contribution
RIS (river island)
Figure labels: area ratio; perception range
>> Discussion: Interpretation(SHAP)
Part 4
WWV (wetland wood vegetation)
A variable that represents breeding places
Positive contribution to habitat suitability regardless of area or ratio
WWV is locally distributed in the study area; as the perception range increases, the area with a high ratio decreases, but the contribution remains highly positive
Figure labels: area ratio; perception range
>> Discussion
Part 4
A high ratio of WWV can mitigate the negative contribution of small water areas
This implies the need to analyze breeding sites and food sources separately, rather than simply interpreting habitats as a whole
Interaction: WAT-WWV
Figure labels: area ratio of WAT; Shapley value of WAT; area ratio of WWV
>> Discussion
Part 4
Small artificial structures near water bodies make a positive contribution
The species shows little avoidance of artificial structures
Fishways, embankments, and bridge structures are often used as resting places (Cho and Choi, 2018)
The absence of AIS has no significant impact
AIS without WAT makes a low contribution
Interaction: AIS-WAT
Figure labels: area ratio of AIS; Shapley value of AIS; area ratio of WAT
>> Discussion
Part 4
SDD is occupied as a resting place, similar to AIS, but makes a much higher positive contribution
SDD near WAT makes a positive contribution to habitat suitability, but its absence does not lead to negative impacts
Interaction: SDD-WAT
Figure labels: area ratio of AIS; Shapley value of AIS; area ratio of WAT
>> Discussion
Part 4
FDP is negatively correlated with WAT
An overly wide FDP is a negative factor because it makes hunting difficult
WWV near WAT is preferred, but WWV shows no positive contribution when it is far from WAT
Interaction: FDP-WAT, WWV
Figure labels: area ratio of FDP; area ratio of WAT; area ratio of WWV; Shapley value of FDP
>> Discussion
Part 4
Waterbody: food source
> Related to water depth and freezing
> Need for a certain wide area
> The most important variable
> The main interaction variable with other variables
River island: resting place
> Proper ratio of island and water is important
> Island boundaries are mainly used
Wetland wood vegetation: breeding place
> Proper wood area that can be used for population breeding
> Mitigates the shortcoming of a narrow water body
> Narrow areas are also used
Artificial infrastructure: substitute for river island
> Only those close to water are used
> The absence is not critical
Floodplain
> Not significant or negative to habitat suitability
>> Conclusion
Part 5
Prediction and interpretation of the habitat characteristics of cormorants with geomorphic and vegetation variables
> We suggested that species distribution models can be appropriately fitted at local scales using landscape variables
> We interpreted the landscape use pattern in terms of distance and area ratio
> Using various perception ranges of the species, we showed that it is important to consider how landscape use varies with life activities such as breeding and feeding
We suggest the possibility of rich ecological interpretation as well as accurate prediction
> It is possible to identify complex ecological interactions and the degree and direction of contributions, going beyond partial dependence plots (PDP) and permutation-based variable importance
> Using various machine learning algorithms and data, it is possible to explore previously unrecognized hypotheses through exploratory model analysis
> Quantitative ecological analysis that supplements qualitative ecological analysis based on expert observations becomes possible, providing grounds for species conservation policies and increasing academic utility
>> Conclusion
Part 5
Simulation-based quantitative verification is required, and the model needs to be improved using continuous monitoring data
Depending on the local context, the range of landscape perception may differ for each barrage
References
1. Cho, S. R., & Choi, H. I. (2018) Avifauna and Maintenance Strategy of Gyeongpo Lake, Gangneung. Korean Journal of Nature Conservation, 17(1), 45-54.
2. Elith, J., & Leathwick, J. R. (2009) Species distribution models: ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics, 40, 677-697.
3. Guisan, A., & Thuiller, W. (2005) Predicting species distribution: offering more than simple habitat models. Ecology Letters, 8(9), 993-1009.
4. Hirzel, A. H., Le Lay, G., Helfer, V., Randin, C., & Guisan, A. (2006) Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling, 199(2), 142-152.
5. Lundberg, S. M., & Lee, S. I. (2017) A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.
6. Park, J. (2020) Cormorants take over the Danyang River... "Gulping down 7.5 kg of fish and damaging nets too" [in Korean]. Yonhap News. https://www.yna.co.kr/view/AKR20200615077100064.
7. Phillips, S. J., Dudík, M., & Schapire, R. E. (2004) A maximum entropy approach to species distribution modeling. In Proceedings of the Twenty-First International Conference on Machine Learning, 655-662.
8. Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3-4), 231-259.
9. Ryo, M., Angelov, B., Mammola, S., Kass, J. M., Benito, B. M., & Hartig, F. (2021) Explainable artificial intelligence enhances the ecological interpretability of black-box species distribution models. Ecography, 44(2), 199-205.
10. Song, H. S., Byeon, J. S., Jo, G. R., Park, J. H., Jeong, J. M., & Park, H. U. (2017) A study on fish feeding by great cormorants in Paldang Lake [in Korean]. Proceedings of the Korean Environmental Sciences Society Conference, 11, 183.
Thank you!
Jeon, Cheongok1, Bahk, Junbeom2
1The Institute for Korean Regional Studies, Seoul National University, Seoul 08826; LophitaL@snu.ac.kr
2Department of Geography, Seoul National University, Seoul 08826; nongsong@snu.ac.kr
@biogeojeon
cheongokjeon@gmail.com
sleepershark91
nongsong@snu.ac.kr